ABSTRACT

An estimate of the nonrandom parameter vector $\beta$ is obtained
in the linear model $Y = X\beta + \epsilon$, where $\epsilon$ is an
unobservable random vector of disturbances assumed to satisfy
$E(\epsilon) = 0$ (the zero vector) and $E(\epsilon\epsilon^T) = V$,
with $V$ assumed unknown. The estimate obtained is the one which
yields maximal similarity to the sample $Y_1, Y_2, \ldots, Y_N$ via
the Sebestyen similarity function. Under the normality assumption,
the resulting estimate is seen to be unbiased, and justification is
given for selecting the maximum likelihood estimate for $V$ in the
Gauss-Markov estimate for $\beta$.


1. INTRODUCTION

Consider the linear model $Y = X\beta + \epsilon$, where $Y$ is an
$n \times 1$ observable random vector, $X$ is an $n \times m$ matrix
of fixed elements with $\mathrm{rank}(X) = m < n$, $\beta$ is an
$m \times 1$ nonrandom vector of parameters to be estimated, and
$\epsilon$ is an unobservable random vector of disturbances assumed
to satisfy $E(\epsilon) = 0$ with unknown variance-covariance matrix
$E(\epsilon\epsilon^T) = V$. It is well known that in case $V$ is
known (up to at least a scalar multiple), the Gauss-Markov theorem
[1] applies and the best linear unbiased estimate of $\beta$ is
given by

$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y \tag{1}$$

Other authors [2, 3, 4] have considered the problem of obtaining
optimal estimates for $\beta$ when $V$ is unknown. Rao [4] showed
that the estimate of $\beta$ obtained by merely substituting an
estimate $\hat{V}$ for $V$ in equation (1) is not necessarily best;
in particular, it may be possible to use known or inferred knowledge
of the covariance $V$ to obtain an estimator with better
characteristics. Born [2] has written a recursive estimator when $V$
is not known but is assumed to be block diagonal with equal diagonal
blocks. McElroy [3] obtained necessary and sufficient conditions on
$V$ for equation (1) to be equivalent to the least-squares solution

$$\hat{\beta} = (X^T X)^{-1} X^T Y \tag{2}$$


In this paper, we assume that the only available information is
that contained in a sample $Y_1, Y_2, \ldots, Y_N$, and an estimate,
$\hat{\beta}$, of $\beta$ is obtained which results in maximal
similarity to the given sample via the Sebestyen [5] similarity
function. The resulting estimate appears in the form of equation (1),
with $V$ replaced by the standard (and in the normal theory case, the
maximum likelihood) estimate of the variance-covariance matrix.


2. THE SEBESTYEN INTERSET SIMILARITY FUNCTION 

If $R^n$ denotes Euclidean $n$-space and $P$ is the class of finite
sequences of sample observations in $R^n$ (i.e., $W, Z \in P$,
provided $W = \{W_1, W_2, \ldots, W_N\}$ and
$Z = \{Z_1, Z_2, \ldots, Z_M\}$, where $W_i, Z_j \in R^n$ for
$i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, M$), the Sebestyen [5]
similarity function is defined as follows.

Definition: If $W, Z \in P$ with $N$ and $M$ elements, respectively,
and if $A$ is any $m \times n$ matrix, define the function
$S_A: P \times P \to R_0$, where $R_0$ is the set of nonnegative real
numbers, by

$$S_A(W,Z) = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}(W_i - Z_j)^T A^T A (W_i - Z_j) \tag{3}$$


(The superscript T denotes the transpose.) Given a transformation
$A$, $S_A(W,Z)$ is a measure of the similarity of the two samples
$W$ and $Z$ in the transformed space (i.e., the resulting space after
transforming by $A$), and if $W$ and $Z$ are random samples from
populations $\pi_1$ and $\pi_2$, respectively, then $S_A(W,Z)$ may be
considered as a measure of the similarity of $\pi_1$ to $\pi_2$.


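If $W, Z \in P$ have sample variance-covariance matrices $V_1$ and
$V_2$, respectively, that is,

$$V_1 = \frac{1}{N}\sum_{i=1}^{N}(W_i - \bar{W})(W_i - \bar{W})^T \tag{4}$$

with $\bar{W}$ the sample mean of $W$ (and $V_2$ defined analogously
for $Z$), and Tr denotes the trace operator [and
$S(W,Z) \triangleq S_I(W,Z)$], then the properties below are easily
verified.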

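Properties:

1. $S_A(W,Z) = \mathrm{Tr}[A(V_1 + V_2)A^T] + (\bar{W} - \bar{Z})^T A^T A (\bar{W} - \bar{Z})$.

2. $S(W,W) = 2\,\mathrm{Tr}(V_1)$ and $S_A(W,W) = 2\,\mathrm{Tr}(A V_1 A^T)$.

3. $S_A(W,Z) = S_A(Z,W)$.

4. $S_A(W,Z) \ge 0$ for every $W, Z \in P$ and for each $m \times n$ matrix $A$.

5. If $v \in R^n$, then $S_A(W,v) = S_A(W,\{v\}) = \mathrm{Tr}(A V_1 A^T) + (\bar{W} - v)^T A^T A (\bar{W} - v)$.

6. If $W = \{W_1, W_2, \ldots, W_N\}$ and $Z = \{Z_1, Z_2, \ldots, Z_M\}$, then
   $S_A(W,Z) = \frac{1}{M}\sum_{j=1}^{M} S_A(W, Z_j)$.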
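
The definition in equation (3) and the closed form in Property 1 can be compared numerically. The following is a minimal sketch in plain Python, assuming vectors are stored as lists of floats and matrices as lists of rows; the function names and the sample data are illustrative choices, not part of the original development.

```python
# Sketch of the Sebestyen similarity S_A(W, Z) of equation (3).
# Vectors are lists of floats; matrices are lists of rows.

def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def similarity(W, Z, A):
    """Mean squared distance between A W_i and A Z_j over all pairs (i, j)."""
    total = 0.0
    for w in W:
        for z in Z:
            d = matvec(A, [wi - zi for wi, zi in zip(w, z)])
            total += sum(x * x for x in d)
    return total / (len(W) * len(Z))

def mean(W):
    """Sample mean vector of W."""
    return [sum(col) / len(W) for col in zip(*W)]

def covariance(W):
    """Sample variance-covariance matrix with divisor N, as in equation (4)."""
    m = mean(W)
    n = len(m)
    return [[sum((w[r] - m[r]) * (w[c] - m[c]) for w in W) / len(W)
             for c in range(n)] for r in range(n)]

def trace_AVAt(A, V):
    """Tr(A V A^T), computed as the sum of a^T V a over the rows a of A."""
    return sum(sum(a * x for a, x in zip(row, matvec(V, row))) for row in A)

def similarity_closed_form(W, Z, A):
    """Right-hand side of Property 1."""
    d = matvec(A, [a - b for a, b in zip(mean(W), mean(Z))])
    return (trace_AVAt(A, covariance(W)) + trace_AVAt(A, covariance(Z))
            + sum(x * x for x in d))

# Illustrative data, made up for this example.
W = [[0.0, 0.0], [2.0, 1.0], [1.0, 3.0]]
Z = [[4.0, 4.0], [5.0, 2.0]]
A = [[1.0, 0.5], [0.0, 2.0]]

direct = similarity(W, Z, A)              # double sum of equation (3)
closed = similarity_closed_form(W, Z, A)  # Property 1
```

The two values `direct` and `closed` should agree up to floating-point rounding. The agreement depends on using the divisor $N$ (rather than $N - 1$) in the sample variance-covariance matrix.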

The Sebestyen decision rule is to classify an unknown $u$ as
belonging to category $W$ provided

$$S_A(W,\{u\}) < S_B(Z,\{u\}) \tag{5}$$

where $A$ and $B$ are preselected transformations for $W$ and $Z$,
respectively. Consequently, the function
$f(u) = S_B(Z,\{u\}) - S_A(W,\{u\})$ is the discriminant function for
the Sebestyen decision rule, with classification of $u$ into $W$ or
$Z$ being accomplished by noting the sign of $f(u)$; that is, $u$ is
classified as belonging to $W$ or $Z$ depending on whether
$f(u) > 0$ or $f(u) < 0$, respectively.
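
The decision rule of equation (5) can be sketched as follows. This is an illustration only: the function names and the two small clusters are assumptions made for the example, and returning `None` when $f(u) = 0$ is our own convention, since the rule as stated leaves that case unspecified.

```python
def similarity_to_point(W, u, A):
    """S_A(W, {u}): mean squared distance from A u to each A W_i."""
    total = 0.0
    for w in W:
        d = [sum(a * (wi - ui) for a, wi, ui in zip(row, w, u)) for row in A]
        total += sum(x * x for x in d)
    return total / len(W)

def discriminant(u, W, Z, A, B):
    """f(u) = S_B(Z, {u}) - S_A(W, {u})."""
    return similarity_to_point(Z, u, B) - similarity_to_point(W, u, A)

def classify(u, W, Z, A, B):
    """Return 'W' if f(u) > 0, 'Z' if f(u) < 0, and None on a tie (assumed)."""
    f = discriminant(u, W, Z, A, B)
    if f > 0:
        return "W"
    if f < 0:
        return "Z"
    return None

# Illustrative data: two well-separated clusters, identity transformations.
I2 = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Z = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

label_near_W = classify([0.5, 0.5], W, Z, I2, I2)
label_near_Z = classify([5.5, 5.0], W, Z, I2, I2)
```

With identity transformations the rule reduces to comparing mean squared Euclidean distances to the two samples, so a point near one cluster is assigned to that cluster.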

3. A TRANSFORMATION TO MINIMIZE THE INTRASET DISTANCE 


Thus far, no specifications have been placed on the transforma- 
tion A; however, if A is an orthogonal matrix, the transformation 
results in a rotation of the original space whereby distances and, 
hence, angles are preserved. If the determinant of A [Det(A)] is 
1, A is a volume-preserving transformation. The transformation 
of interest in this paper is specified in Theorem 1, the proof 
of which is dependent on the following well-known relationship 
between the arithmetic and geometric mean. 


Lemma 1: If $d_i > 0$ for $i = 1, 2, \ldots, n$, then

$$\frac{1}{n}\sum_{i=1}^{n} d_i \ge \left(\prod_{i=1}^{n} d_i\right)^{1/n} \tag{6}$$

with equality holding if and only if $d_1 = d_2 = \cdots = d_n$.


Theorem 1: Under the condition that $\mathrm{Det}(A) = 1$ and $V_1$
is positive definite, an $n \times n$ matrix $A$ minimizes $S_A(W,W)$
[that is, the similarity of a set with itself] if and only if
$A V_1 A^T = \lambda I$, where $\lambda = [\mathrm{Det}(V_1)]^{1/n}$
and $V_1$ is the sample variance-covariance matrix of $W$ specified
in equation (4).

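Proof: If $B$ is any $n \times n$ matrix with $\mathrm{Det}(B) = 1$
and $A$ is the matrix specified in the hypothesis, then from
Property 2 and the fact that $A V_1 A^T = \lambda I$, we have
$S_B(W,W) - S_A(W,W) = 2\,\mathrm{Tr}(B V_1 B^T) - 2n\lambda$.
Letting $U$ be the orthogonal matrix such that
$U B V_1 B^T U^T = D$, where $D$ is diagonal with diagonal elements
$d_1, d_2, \ldots, d_n$, then

$$2\,\mathrm{Tr}(B V_1 B^T) - 2n\lambda = 2\,\mathrm{Tr}(U B V_1 B^T U^T) - 2n\lambda = 2\,\mathrm{Tr}(D) - 2n\lambda = 2n\left[\frac{1}{n}\sum_{i=1}^{n} d_i - \left(\prod_{i=1}^{n} d_i\right)^{1/n}\right] \tag{7}$$

The last equality holds because
$\prod_{i=1}^{n} d_i = \mathrm{Det}(D) = \mathrm{Det}(B V_1 B^T) = \mathrm{Det}(V_1) = \lambda^n$,
since $\mathrm{Det}(B) = 1$ and $U$ is orthogonal. But equation (7)
is nonnegative, by Lemma 1, with equality holding if and only if
$d_1 = d_2 = \cdots = d_n$, in which case $B V_1 B^T = \lambda I$,
which was to be demonstrated. Note that $A_W$ exists if $V_1$ is
positive definite. Indeed, $A_W = \Lambda E^T$, where $E$ is the
matrix whose columns are the orthonormal eigenvectors of $V_1$ and
$\Lambda$ is a diagonal matrix whose $i$th diagonal element is
$\sqrt{\lambda/\beta_i}$, $i = 1, 2, \ldots, n$, where
$\beta_i$ = $i$th eigenvalue of $V_1$.

Theorem 1 associates with each sample $W \in P$ a transformation,
$A_W$, with the property that $A_W$ causes $W$ to cluster in a
spherical fashion after transformation with uncorrelated variates
having equal variances. If $W$ and $Z$ are samples from populations
$\pi_1$ and $\pi_2$ and if $A$ and $B$ are selected such that

$$A = \frac{1}{\sqrt{\lambda_A}}\,A_0, \qquad B = \frac{1}{\sqrt{\lambda_B}}\,B_0 \tag{8}$$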
where $\lambda_A = [\mathrm{Det}(V_1)]^{1/n}$,
$\lambda_B = [\mathrm{Det}(V_2)]^{1/n}$, and $A_0$ and $B_0$ are
determined independently for $W$ and $Z$, respectively, by Theorem 1,
the effect is that of a normalization of the intraset similarity
in that not only are the intraset distances minimal but
$S_A(W,W) = S_B(Z,Z) = 2n$ as well (i.e., the normalization gives
each intraset similarity the same value). Moreover, if instead
of a threshold of zero in the Sebestyen decision rule [see
eq. (5)], we choose the threshold

$$T = \ln\frac{\mathrm{Det}(V_1)\,p_2^2}{\mathrm{Det}(V_2)\,p_1^2} \tag{9}$$

the resulting decision rule is the Bayes maximum likelihood
decision rule [6] (when the population distributions are Normal)
with equal costs of misclassification and a priori probabilities
$p_1$ and $p_2$ for $\pi_1$ and $\pi_2$, respectively.
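
A transformation satisfying the condition of Theorem 1, and its normalized form, can be constructed numerically. The sketch below does not use the eigenvector construction described in the text; as an assumption made for brevity it takes $A_0 = \sqrt{\lambda}\,L^{-1}$, where $V_1 = L L^T$ is the Cholesky factorization, which also gives $A_0 V_1 A_0^T = \lambda I$ and $\mathrm{Det}(A_0) = 1$. The matrix `V1` and all function names are illustrative.

```python
import math

def cholesky(V):
    """Lower-triangular L with L L^T = V, for symmetric positive definite V."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def invert_lower(L):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for r in range(c, n):
            rhs = 1.0 if r == c else 0.0
            s = rhs - sum(L[r][k] * inv[k][c] for k in range(c, r))
            inv[r][c] = s / L[r][r]
    return inv

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Matrix transpose."""
    return [list(col) for col in zip(*A)]

def theorem1_transform(V):
    """Return (A0, lam) with Det(A0) = 1 and A0 V A0^T = lam * I."""
    n = len(V)
    L = cholesky(V)
    det_V = math.prod(L[i][i] for i in range(n)) ** 2  # Det(V) = Det(L)^2
    lam = det_V ** (1.0 / n)
    A0 = [[math.sqrt(lam) * x for x in row] for row in invert_lower(L)]
    return A0, lam

# Illustrative sample variance-covariance matrix, made up for this example.
V1 = [[4.0, 1.0], [1.0, 3.0]]
A0, lam = theorem1_transform(V1)
sphered = matmul(matmul(A0, V1), transpose(A0))       # lam * I
A = [[x / math.sqrt(lam) for x in row] for row in A0]  # equation (8)
normalized = matmul(matmul(A, V1), transpose(A))      # identity matrix
intraset = 2.0 * sum(normalized[i][i] for i in range(len(V1)))  # S_A(W, W)
```

Here `intraset` evaluates Property 2 for the normalized transformation and should equal $2n$ (4 in this two-dimensional example) up to rounding, in agreement with the normalization described above.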

4. THE ESTIMATE FOR $\beta$

In the weighted least-squares procedure, an estimator of $\beta$ was
selected which minimized $(Y - X\beta)^T V^{-1} (Y - X\beta)$. The
optimal estimate is specified in equation (1) and the significance of
such an estimator is that of being able to predict, or adjust in
some applications, $Y$ to a given matrix $X$. Since $V$ is ordinarily
unknown, we proceed as described below.

Collect $N$ sample values; denote this sample by
$W = \{Y_1, Y_2, \ldots, Y_N\}$ and the sample variance-covariance
matrix by

$$\hat{V} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})(Y_i - \bar{Y})^T \tag{10}$$

If $A$ is selected such that

$$A\hat{V}A^T = \lambda I \tag{11}$$

where

$$\lambda = [\mathrm{Det}(\hat{V})]^{1/n} \tag{12}$$

then Theorem 1 guarantees that the similarity of $W$ with itself
is a minimum after transformation by $A$. For prediction or
adjustment purposes, what we now want to do is to select the vector
$Z = X\beta$ which, after transformation, is more similar to the
representative sample $W$ than any other such vector. Equivalently,
we want to select $\beta$ such that $S_A(W,\{Z\})$ is a minimum where
$Z = X\beta$ and $A$ satisfies equation (11).


Theorem 2: The value of $\beta$ which minimizes $S_A(W, X\beta)$ is
given by

$$\hat{\beta} = (X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} \bar{Y} \tag{13}$$

where $\hat{V}$ is the sample variance-covariance matrix of $W$, $A$
is the transformation specified in Theorem 1, and $\bar{Y}$ is the
sample mean of $W$.


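Proof: From Property 5 of $S_A$,

$$S_A(W, X\beta) = \mathrm{Tr}(A\hat{V}A^T) + (\bar{Y} - X\beta)^T A^T A (\bar{Y} - X\beta) \tag{14}$$

Differentiating this expression with respect to $\beta$, equating
to 0, and solving for $\beta$ yields

$$\hat{\beta} = (X^T A^T A X)^{-1} X^T A^T A \bar{Y} \tag{15}$$

However, from the condition that $A\hat{V}A^T = \lambda I$,

$$A^T A = \lambda \hat{V}^{-1} \tag{16}$$

which results in equation (13) after substitution into
equation (15), since the scalar $\lambda$ cancels, which was to be
demonstrated.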
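
The estimate of equation (13) can be computed directly from a sample. The following is a minimal sketch in plain Python, assuming the sample is stored as a list of observation vectors and that its variance-covariance matrix is nonsingular; the helper names, the design matrix `X`, and the data `W` are illustrative assumptions. The quantity `objective` is the $\beta$-dependent part of equation (14) with the constant factor $\lambda$ dropped.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def mean(W):
    """Sample mean vector."""
    return [sum(col) / len(W) for col in zip(*W)]

def covariance(W):
    """Sample variance-covariance matrix with divisor N, equation (10)."""
    m = mean(W)
    n = len(m)
    return [[sum((w[r] - m[r]) * (w[c] - m[c]) for w in W) / len(W)
             for c in range(n)] for r in range(n)]

def estimate_beta(W, X):
    """beta_hat = (X^T V^-1 X)^-1 X^T V^-1 Ybar, equation (13)."""
    Ybar, V = mean(W), covariance(W)
    cols = [list(c) for c in zip(*X)]        # columns x_k of X
    VinvX = [solve(V, c) for c in cols]      # V^-1 x_k for each column
    G = [[sum(a * b for a, b in zip(ci, vk)) for vk in VinvX] for ci in cols]
    g = [sum(a * b for a, b in zip(vk, Ybar)) for vk in VinvX]
    return solve(G, g)                       # solves (X^T V^-1 X) beta = X^T V^-1 Ybar

def objective(beta, W, X):
    """(Ybar - X beta)^T V^-1 (Ybar - X beta)."""
    Ybar, V = mean(W), covariance(W)
    r = [y - sum(x * b for x, b in zip(row, beta)) for y, row in zip(Ybar, X)]
    return sum(a * b for a, b in zip(r, solve(V, r)))

# Illustrative data: n = 2 components, m = 1 parameter, N = 5 observations.
X = [[1.0], [2.0]]
W = [[1.0, 2.1], [0.8, 1.9], [1.2, 2.4], [1.1, 1.8], [0.9, 2.3]]

beta_hat = estimate_beta(W, X)
```

Because only the sample mean and the sample variance-covariance matrix enter equation (13), the transformation $A$ itself never has to be formed in order to compute the estimate.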
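
Under the normality assumption on $Y$ [i.e., $Y \sim \mathrm{MVN}(X\beta, V)$],
where $V$ is unknown, $\hat{V}$ and $\bar{Y}$ are independent;
therefore

$$\begin{aligned}
E(\hat{\beta}) &= E[(X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} \bar{Y}] \\
&= E[(X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1}]\,E(\bar{Y}) \\
&= E[(X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1}]\,X\beta \\
&= E[(X^T \hat{V}^{-1} X)^{-1} (X^T \hat{V}^{-1} X)]\,\beta \\
&= \beta
\end{aligned} \tag{17}$$

Consequently, $\hat{\beta}$ is unbiased under these conditions.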
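
The unbiasedness result can be illustrated, though of course not proved, by simulation. The sketch below assumes a two-component model with one parameter, a fixed true covariance $V = L L^T$, and $N = 8$ normal observations per replication; all numerical values, the seed, and the function name are arbitrary choices made for the example.

```python
import random

def simulate_beta_hat(rng, beta, x, L, N):
    """Draw N observations Y_i = x*beta + L z_i and return beta_hat of equation (13)."""
    ys = []
    for _ in range(N):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = L[0][0] * z1
        e2 = L[1][0] * z1 + L[1][1] * z2
        ys.append((x[0] * beta + e1, x[1] * beta + e2))
    # Sample mean and sample variance-covariance matrix (divisor N).
    m1 = sum(y[0] for y in ys) / N
    m2 = sum(y[1] for y in ys) / N
    v11 = sum((y[0] - m1) ** 2 for y in ys) / N
    v22 = sum((y[1] - m2) ** 2 for y in ys) / N
    v12 = sum((y[0] - m1) * (y[1] - m2) for y in ys) / N
    # For a 2x2 matrix, V^-1 is proportional to [[v22, -v12], [-v12, v11]];
    # the determinant cancels between numerator and denominator.
    num = x[0] * (v22 * m1 - v12 * m2) + x[1] * (v11 * m2 - v12 * m1)
    den = x[0] ** 2 * v22 - 2.0 * x[0] * x[1] * v12 + x[1] ** 2 * v11
    return num / den

rng = random.Random(0)           # fixed seed so the run is repeatable
beta_true = 1.5                  # assumed true parameter
x = (1.0, 2.0)                   # the single column of X
L = ((1.0, 0.0), (0.5, 1.0))     # lower-triangular factor of the true V
draws = [simulate_beta_hat(rng, beta_true, x, L, N=8) for _ in range(4000)]
mc_mean = sum(draws) / len(draws)
```

The Monte Carlo average `mc_mean` should lie close to `beta_true`, within simulation error, which is consistent with equation (17).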


5. SUMMARY

An estimate, $\hat{\beta}$, of $\beta$ in the linear model
$Y = X\beta + \epsilon$ was obtained such that $X\hat{\beta}$ yielded
maximal similarity to the sample $Y_1, Y_2, \ldots, Y_N$ via the
Sebestyen similarity function. The unobservable random error term,
$\epsilon$, was assumed to satisfy $E(\epsilon) = 0$ (the zero
vector) and $E(\epsilon\epsilon^T) = V$, with $V$ assumed to be
unknown. The resulting estimate is seen to be in the same form as the
standard Gauss-Markov estimate

$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$

except $V$ is replaced by the standard (and under the normality
assumption, the maximum likelihood) estimate of the variance-
covariance matrix.


6 . 